ABSTRACT. The symbol-based epistemology used in AI is contrasted with the 
constructivist, coherence epistemology promoted by cybernetics. The latter leads to 
bootstrapping knowledge representations, in which different parts of the system 
mutually support each other. Gordon Pask's entailment meshes are reviewed as a 
basic application of this approach, and then extended to entailment nets: directed 
graphs governed by the “bootstrapping axiom”, determining which concepts are to 
be distinguished or merged. This allows a constant restructuring of the conceptual 
network. Semantic networks and frame-like representations can be expressed in this 
scheme by introducing a basic ontology of node and link types. Entailment nets are 
then generalized to associative networks with weighted links. Learning algorithms 
are presented which can adapt the link strengths, based on the frequency with which 
links are selected by hypertext users. It is argued that such bootstrapping methods 
can be applied to make the World-Wide Web more intelligent, allowing it to 
self-organize and support inferences. 


1. Introduction 


In my contribution to this special issue devoted to the memory of Gordon Pask, I wish 
to review my research on knowledge representation and knowledge acquisition. 
Although it was started independently, this research program in a number of ways 
parallels and extends Pask's work on Conversation Theory, and in particular its 
representation through entailment meshes and their implementation in the computer 
program THOUGHTSTICKER (Pask, 1975, 1976, 1980, 1984, 1990; Pask & Gregory, 
1986). My work started around 1983 with the development of a “structural language” 
for representing the fundamental space-time structures of physics (Heylighen, 1990a). 
This primitive modelling scheme plays a role similar to Pask's (1984) “protolanguage” or 
“protologic” Lp (cf. Heylighen, 1992). An investigation into artificial intelligence (AI) 
and cognitive science made me understand that the main application of this language 
might lie in knowledge representation rather than in the foundations of physics. 
However, the limitations of AI made me turn to cybernetics as a more general theoretical 
framework (Heylighen, 1987, 1990). 

I first came into contact with Gordon Pask through the conference “Self-Steering 
and Cognition in Complex Systems” (Heylighen, Rosseel & Demeyere, 1990) which I 
co-organized at the Free University of Brussels in 1987. As one of the founding fathers 
of cybernetics, Pask was invited to give a keynote speech (Pask, 1990). In his inimitable 
style, he reviewed his work on conversation theory and Lp. His manners of an 
archetypical British eccentric and his dandyish appearance, with bow-tie, cape and 
walking-stick, did not fail to impress the audience, myself included. At that moment, I 
did not yet understand the relevance of his approach for my own work, though. 

I had the chance to get better acquainted with the Paskian philosophy during a short 
stay at the University of Amsterdam in 1988-1989. There I worked for the research 
program “Support, Survival and Culture”, headed by Gerard de Zeeuw and Gordon 
Pask. The program's focus on computer-supported collaborative work incited me to 
implement my ideas into a prototype software system, the CONCEPTORGANIZER, 
which supports one or more users in exteriorizing their implicit knowledge in the form 
of a hypertext network of concepts (Heylighen, 1989, 1991a). Although I was not 
consciously influenced, the similarity in both intention and name between this program 
and Pask's THOUGHTSTICKER seems obvious in retrospect. 

My interest in the foundations of cybernetics and in conceptual networks led me to 
join Valentin Turchin and Cliff Joslyn in the creation of the Principia Cybernetica 
Project (Joslyn, Heylighen & Turchin, 1993). The project's aim is the 
computer-supported collaborative development of an evolutionary-cybernetic philosophy. Its 
results are presently implemented as a large hypertext net on the World-Wide Web 
(Heylighen, Joslyn & Turchin, 1997). Although some cyberneticists reacted critically to 
the project's foundational ambitions, it was welcomed enthusiastically by Gordon Pask. 
In his contribution to the project's first workshop, he noted that “a philosophy of 
Cybernetics, encapsulated in the [...] title, 'Principia Cybernetica', is not only justifiable, 
but necessary and in this day and age, utterly essential” (Pask, 1991). In my own 
contribution, I sketched an extension of my work on the structural language to 
encompass semantic networks as a possible framework to structure the knowledge 
produced by the project (Heylighen, 1991b). 

A focus on the philosophical content as well as on the organizational and technical 
implementation of the project during the following years kept me from further 
developing my scheme for knowledge representation. However, in 1994, when I was 
joined by Johan Bollen, we started to work on a new application, an associative network 
of concepts that would self-organize or “learn” from the way it is used (Bollen & 
Heylighen, 1996, 1998). Although I knew his health was deteriorating, I had hoped to 
present this work to Gordon Pask during the 13th European Meeting of Cybernetics 
and Systems in Vienna (April 1996), where we were both chairing a symposium. 
Unfortunately, Gordon died a few weeks before the congress, and the meeting became 
the first occasion for those who had known him to reminisce and pay tribute to his 
work. 

The present paper will review my on-going research on “bootstrapping” methods 
for knowledge representation, emphasizing the similarities (and differences) with Pask's 
approach. Although this review will mostly summarize ideas scattered over different 
papers (Heylighen, 1989, 1990a, 1990d, 1991a, 1991b, 1999; Bollen & Heylighen, 1996, 
1997, 1998; Heylighen & Bollen, 1996), some of the results included here have not been 
published before. 


2. Epistemology: correspondence versus coherence 


Both Pask's work on knowledge representation and my own are distinguished from the 
more traditional AI approach by their underlying epistemology. Most AI work 
implicitly assumes a correspondence epistemology, which sees knowledge as a simple 
mapping or “reflection” of the external world. Every conceptual object (symbol) in the 
knowing subject's model is supposed to correspond to one or more physical objects in 
the environment. The structure of the model can be seen as a homomorphic map, or an 
encoding, of the structure of outside reality. This epistemology, which has been called 
the “reflection-correspondence” theory by Turchin (1993), leads to a host of conceptual 
and practical problems (Bickhard & Terveen, 1995). 

The most pressing ones center around the origin and nature of the mapping from 
reality to its symbolic representation. Since a cognitive system has no access to reality 
(Kant's “Ding an Sich”) except through perceptions, which are already internal 
models, how can it ever determine whether it uses a correct mapping? Another 
formulation of this difficulty is the symbol grounding problem (Harnad, 1990): how are 
the symbols, the elements of the model, “grounded” in the external reality which they 
are supposed to represent? This problem cannot be solved within the model itself. This 
follows from the “linguistic complementarity” principle (Löfgren, 1991), which 
generalizes classic epistemological restrictions such as the theorem of Gödel or the 
Heisenberg indeterminacy principle. It states that no language can fully describe its own 
description or interpretation processes. In other words, models cannot include a 
representation of the mapping that connects their symbols to their interpretations. 

Because there is no inherent procedure to determine a correct mapping, models in 
AI tend to be arbitrarily imposed by the system's programmer or designer. The model's 
foundations or building blocks, the symbols, are primitives which have to be accepted at 
face value, without formal justification. The model can be more or less adequate for the 
problem domain, but never complete, in the sense of covering all potentially relevant 
situations. As argued by van Brakel (1992), this 'problem of complete description' 
generalizes the famous 'frame problem' in AI (Ford & Hayes, 1991). Incompleteness 
would not be a real obstacle if the models could adapt or learn, that is, extend their 
capabilities each time a problem is encountered. But the correspondence philosophy 
does not allow any simple way for a model to be changed. New symbols cannot be 
derived from those that are already there, since they are supposed to reflect outside 
phenomena to which the model does not have access. Introducing a new symbol must be 
done by the programmer, and requires a redefinition of the syntax and semantics of the 
model. Thus, models in AI tend to be static, absolutist and largely arbitrary in structure 
and contents. 


BOOTSTRAPPING KNOWLEDGE REPRESENTATIONS 4 


These problems have helped focus attention on an alternative epistemological 
position, constructivism, which is espoused by most cyberneticists, and emphasized in 
“second-order” cybernetics (von Foerster, 1996) and the theory of autopoiesis 
(Maturana & Varela, 1992). According to this philosophy, knowledge is not a passive 
mapping of outside objects, but an active construction by the subject. That construction 
is not supposed to reflect an objective reality, but to help the subject adapt or “fit in” to 
the world which it subjectively experiences. 

This means that the subject will try to build models which are coherent with the 
models which it already possesses, or which it receives through the senses or through 
communication with others. Since models are only compared with other models, the lack 
of access to exterior reality no longer constitutes an obstacle to further development. In 
such an epistemology, knowledge is not justified or “true” because of its correspondence 
with an outside reality, but because of its coherence with other pieces of knowledge 
(Rescher, 1973; Thagard, 1989). The problem remains to specify what “coherence” 
precisely means: mere consistency is clearly not sufficient, since any collection of 
unrelated propositions is logically consistent. 

Model construction can be seen as a trial-and-error process, where different 
variations are generated, but only those variations are retained which “fit in” with the 
rest of the experiential material. Thus, the process is selectionist (Cziko, 1995; Bickhard 
& Terveen, 1995) rather than instructionist: instead of instructing the subject on how to 
build a model, the (inside or outside) environment merely helps it to select the most 
“fit” models among all of the subject's autonomously generated trials. 

Many constructivists tend to emphasize the role of social interaction in this 
selection process: those models will be retained about which there is a consensus within 
the community. This is the social constructivist position, which is popular especially in 
the social sciences and humanities. Psychologists like von Glasersfeld (1984) and Piaget 
(1937), on the other hand, emphasize the individual subject, who tries to find coherence 
between his or her different models and perceptions. The selectionists inspired by 
Popper's (1959) evolutionary epistemology and theory of falsification, finally, 
emphasize the role of the outside environment in weeding out inadequate models. For 
obvious reasons, this more “realist” position is most popular in the natural sciences. 
My own philosophy is pragmatic, and acknowledges the combined role of individual, 
social and physical (“objective”) factors in the selection of knowledge (Heylighen, 1993, 
1997). 

Although constructivists usually also accept some form of a coherence view of 
truth, few people have proposed concrete mechanisms that show how the dynamic 
process of construction (variation) can result in the static requirement of coherence 
(selective retention). The originality of Pask's Conversation Theory is that it provides a 
detailed formal model of such mechanisms. The metaphor of conversation is aptly 
chosen to describe such a process of cognitive interaction, in which concepts are 
exchanged, combined and recombined (the construction phase), with the aim of achieving 
agreement about shared meanings (the coherence phase). Although the conversation 
metaphor seems to favor the social construction of knowledge, Pask is quick to point 
out that his theory applies equally well to interactions between different roles or 
perspectives (“p-individuals”) within a single individual. I have suggested elsewhere 
(Heylighen, 1990c) that the conversational perspective could even be extended to 
interactions between observers and objects. Pask might not have disagreed with this 
generalization, since he saw his THOUGHTSTICKER program as a way to turn a computer 
into a conversational partner. 

To describe my own solution to the problem, however, I prefer the metaphor of 
bootstrapping (cf. Heylighen, 1990a, 1990d, 1992). As noted above, the problem with 
correspondence epistemologies is that they lack grounding: everything is built on top of 
the symbols, which constitute the atoms of meaning; yet, the symbols themselves are 
not supported. The advantage of a coherence epistemology is that there is no need for a 
fixed ground or foundation on which to build models: coherence is a two-way relation. In 
other words, coherent concepts support each other. The dynamic equivalent of this 
mutual support relation may be called “bootstrapping”: Model A can be used to help 
construct model B, while B is used to help construct A. It is as if I am pulling myself up 
by my own bootstraps: while my arms (A) pull up my boots (B), my boots at the same 
time—through my legs, back and shoulders—push up my arms. The net effect is that 
more (complexity, meaning, quality, ...) is produced out of less. This is the hallmark of 
self-organization: the creation of structure without need for external intervention. 

I will now show how this bootstrapping philosophy can be applied to the practical 
problem of knowledge representation, first by reviewing Pask's entailment meshes, then 
by extending the underlying formalism to my own entailment nets. 


3. Entailment Meshes and the THOUGHTSTICKER program 


During the development of Pask's Conversation Theory, it turned out that many of its 
statements could be succinctly expressed by graphical schemes that were called 
“entailment meshes”. These informal representations were later interpreted as 
expressions in a formal language of distinction and coherence, which Pask called Lp. The 
manipulation of Lp expressions was facilitated by an interactive computer program 
called THOUGHTSTICKER (Pask, 1984; Pask & Gregory, 1986). As the name implies, the 
main aim of THOUGHTSTICKER is to help users exteriorize their thoughts in the form of 
a stable and explicit knowledge representation. (The same aim underlies my own 
CONCEPTORGANIZER prototype (Heylighen, 1989, 1991a).) I will here summarize the 
representation of knowledge through entailment meshes, and the rules that are embedded 
in the THOUGHTSTICKER software for manipulating and eliciting these meshes. 

The basic elements of an entailment mesh are called topics. These topics are 
connected through coherences. A coherence is a collection or cluster of topics which are 
so interrelated that the meaning of any topic of the coherence can be derived from the 
meaning of the relation among the other topics in the cluster. In other words, the topics 
in a coherence entail or define each other. A simple example of a coherence is the cluster 
<pen, paper, writing>. This means that we can somehow start from the concepts of 
writing and paper and produce the concept of an instrument that allows you to write 
on paper: a pen. Complementarily, we can start from the notions of pen and paper and 
derive from them the activity you do when applying a pen to paper: writing. 


Fig. 1: (a) a simple entailment mesh consisting of two coherences overlapping on the topic writing; (b) 
a 'prune' unfoldment of the same mesh from the point of view of the topic pen. 


The same topic, e.g. writing, can belong to different coherences, for example: <pen, 
paper, writing> and <table, chair, writing> (see Fig. 1a). In other words, coherences 
can overlap in one or more topics. An entailment mesh is then a complex of overlapping 
coherences, that is, a collection of topics and coherences such that every topic belongs 
to at least one coherence (see Fig. 2). Every topic in an entailment mesh should be 
entailed by, or unambiguously derivable from, the other topics in the mesh. This 
derivation can be represented in THOUGHTSTICKER by the operation called 'Prune'. 
Prune produces an unfoldment of the mesh from the perspective of the concept you 
want to derive. For example, Fig. 1b shows a pruning of the mesh in Fig. 1a from the 
point of view of pen. With symbols this derivation could be represented as: pen ← 
{writing, paper}, writing ← {table, chair}. 
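
The unfoldment described above can be illustrated with a minimal sketch. It assumes that a mesh is stored as a list of topic sets and that Prune walks the coherences breadth-first from the chosen topic; the names `derivations` and `prune` are hypothetical and are not taken from THOUGHTSTICKER itself.

```python
from typing import FrozenSet, List, Tuple

Coherence = FrozenSet[str]
Mesh = List[Coherence]


def derivations(mesh: Mesh, topic: str) -> List[Coherence]:
    """For each coherence containing `topic`, return the other topics that entail it."""
    return [coherence - {topic} for coherence in mesh if topic in coherence]


def prune(mesh: Mesh, start: str) -> List[Tuple[str, Coherence]]:
    """Unfold the mesh breadth-first from `start`; each step reads 'topic <- sources'."""
    steps: List[Tuple[str, Coherence]] = []
    unfolded = set()   # coherences already used in a derivation step
    visited = {start}
    queue = [start]
    while queue:
        topic = queue.pop(0)
        for coherence in mesh:
            if topic in coherence and coherence not in unfolded:
                unfolded.add(coherence)
                sources = coherence - {topic}
                steps.append((topic, sources))
                for source in sorted(sources):
                    if source not in visited:
                        visited.add(source)
                        queue.append(source)
    return steps


# The mesh of Fig. 1a: two coherences overlapping on the topic "writing".
mesh = [
    frozenset({"pen", "paper", "writing"}),
    frozenset({"table", "chair", "writing"}),
]
steps = prune(mesh, "pen")
for topic, sources in steps:
    print(topic, "<-", sorted(sources))
```

Under these assumptions the two printed steps correspond to the derivation pen ← {writing, paper}, writing ← {table, chair} given in the text.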


Fig. 2: a more complex entailment mesh 


The main function of the Prune operation is to discover structural ambiguities or 
conflicts in a mesh. An ambiguity arises when different topics are derived in the same 
way. In that case, there is no way to distinguish the topics within the mesh. Consider 
for example the two coherences <pen, paper, writing> and <pencil, paper, writing>. 
In this case, it is not clear which of the two topics pencil or pen should be produced 
from paper and writing: pen ← {writing, paper} and pencil ← {writing, paper} both 
hold. This poses a problem for the knowledge representation, which can be resolved in 
one of the following ways: 


• merge topics: the ambiguous topics may be really equivalent, in the sense that either 
they are synonyms (e.g. persons and people), or they are two tokens (e.g. pen and 
pencil) standing for the same basic concept (e.g. writing device). The resolution is 
to merge the topics, thus producing the single new coherence: <writing device, 
paper, writing> 

• add topic: if the topics are really distinct, this must be reflected in the entailment 
mesh by changing the coherences that define them. The simplest way to do that is to 
add a topic to one of the two coherences. For example, one could replace <pencil, 
paper, writing> by <pencil, paper, writing, erasable>. The derivation of pencil 
from its coherence cluster is now different from the derivation of pen, and thus the 
ambiguity is resolved. 

• split topic: another way to create distinction where there is none is to split or 
“bifurcate” one of the topics occurring in both coherences into two distinct topics. 
For example, writing could be split into writing letters and writing notes, 
resulting in the two coherences <pen, paper, writing letters> and <pencil, paper, 
writing notes>. Since the two new topics derive from the same underlying idea, 
they remain related in THOUGHTSTICKER by a relation of analogy. 

• merge coherences: another way to eliminate the ambiguity is to replace the two 
coherences by one new coherence including all topics, e.g. <pen, pencil, paper, 
writing>. Now, pen and pencil each have a different derivation. 
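
As an illustration, the following sketch checks the pen/pencil example and the four resolutions listed above. It assumes that two topics count as ambiguous when the sets of "other topics" from which they can be derived coincide; the function name `ambiguous_groups` is hypothetical, and the test is only one possible reading of how THOUGHTSTICKER detects such conflicts.

```python
from collections import defaultdict
from typing import FrozenSet, List, Set


def ambiguous_groups(mesh: List[FrozenSet[str]]) -> List[Set[str]]:
    """Group topics whose derivations (coherence minus the topic) are identical."""
    by_signature = defaultdict(set)
    topics = set().union(*mesh)
    for topic in topics:
        signature = frozenset(c - {topic} for c in mesh if topic in c)
        by_signature[signature].add(topic)
    return [group for group in by_signature.values() if len(group) > 1]


conflict = [
    frozenset({"pen", "paper", "writing"}),
    frozenset({"pencil", "paper", "writing"}),
]
merge_topics = [frozenset({"writing device", "paper", "writing"})]
add_topic = [
    frozenset({"pen", "paper", "writing"}),
    frozenset({"pencil", "paper", "writing", "erasable"}),
]
split_topic = [
    frozenset({"pen", "paper", "writing letters"}),
    frozenset({"pencil", "paper", "writing notes"}),
]
merge_coherences = [frozenset({"pen", "pencil", "paper", "writing"})]

print(ambiguous_groups(conflict))  # pen and pencil are reported as one group
for resolved in (merge_topics, add_topic, split_topic, merge_coherences):
    print(ambiguous_groups(resolved))  # each resolution leaves no ambiguous group
```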


Another type of ambiguity can arise when coherences are nested, i.e. when one 
coherence, e.g. <pen, pencil, ball-point>, is a subset of another coherence, e.g. <pen, 
pencil, ball-point, paper, writing>. In that case, it is not clear how to derive a topic 
belonging to both coherences, e.g. is it pen ← {writing, paper, ball-point, pencil} or 
pen ← {ball-point, pencil}? Such a nested construction is illegal in THOUGHTSTICKER. 
The way to resolve it is to “condense” the topics of the inner cluster (<pen, pencil, 
ball-point>) into a new, “generalized” concept, e.g. writing device. This leaves a single 
new cluster, <writing device, paper, writing> (see Fig. 3). 
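
A minimal sketch of this condensation step, assuming a mesh is a list of topic sets: nested coherences are found by a proper-subset test, and the inner cluster is replaced by one new topic wherever it occurs. The names `nested_pairs` and `condense` are hypothetical, and coherences that only partially overlap the inner cluster are deliberately left untouched.

```python
from typing import FrozenSet, List, Tuple

Coherence = FrozenSet[str]


def nested_pairs(mesh: List[Coherence]) -> List[Tuple[Coherence, Coherence]]:
    """Return (inner, outer) pairs where one coherence is a proper subset of another."""
    return [(inner, outer) for inner in mesh for outer in mesh if inner < outer]


def condense(mesh: List[Coherence], inner: Coherence, new_topic: str) -> List[Coherence]:
    """Replace the topics of `inner` by `new_topic` in every coherence that contains them all."""
    result: List[Coherence] = []
    for coherence in mesh:
        if coherence == inner:
            continue  # the inner cluster itself is absorbed into the new topic
        if inner <= coherence:
            coherence = (coherence - inner) | {new_topic}
        if coherence not in result:
            result.append(coherence)
    return result


mesh = [
    frozenset({"pen", "pencil", "ball-point"}),
    frozenset({"pen", "pencil", "ball-point", "paper", "writing"}),
]
inner, outer = nested_pairs(mesh)[0]
condensed = condense(mesh, inner, "writing device")
print([sorted(c) for c in condensed])
```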



Fig. 3: a condensation of the subcoherence <pen, pencil, ball-point> into the new, more general topic 
writing device. 


THOUGHTSTICKER will assist the user in representing her implicit knowledge more 
explicitly, and in searching for conflicts and proposing resolutions. The latter will elicit 
new knowledge by pointing to gaps and ambiguities in the knowledge that is already 
there. Moreover, THOUGHTSTICKER can directly suggest possible expansions of the 
knowledge base through the 'Saturate' operation. This will produce new candidate 
coherences that would not create structural ambiguities. These suggested combinations 
of topics may be completely random, or restricted to certain ranges of topics defined by 
the user. It is up to the user to decide whether the proposed cluster is meaningful or not. 
Thus, THOUGHTSTICKER constantly interacts or “converses” with the user, helping to 
construct an ever more complete and well-balanced system of concepts. 
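
The 'Saturate' idea can be sketched as a filter over candidate coherences. The sketch below is an assumption-laden illustration, not THOUGHTSTICKER's procedure: it enumerates all combinations of a fixed size instead of sampling randomly, it treats both ambiguity and nesting as structural conflicts, and the names `saturate`, `has_ambiguity` and `has_nesting` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional

Coherence = FrozenSet[str]


def has_ambiguity(mesh: List[Coherence]) -> bool:
    """True if two topics are derived from exactly the same sets of other topics."""
    seen = set()
    for topic in set().union(*mesh):
        signature = frozenset(c - {topic} for c in mesh if topic in c)
        if signature in seen:
            return True
        seen.add(signature)
    return False


def has_nesting(mesh: List[Coherence]) -> bool:
    """True if one coherence is a proper subset of another."""
    return any(a < b for a in mesh for b in mesh)


def saturate(mesh: List[Coherence], size: int = 3,
             allowed: Optional[Iterable[str]] = None) -> List[Coherence]:
    """Propose new coherences of `size` topics that create no structural conflict."""
    topics = sorted(allowed if allowed is not None else set().union(*mesh))
    proposals = []
    for combo in combinations(topics, size):
        candidate = frozenset(combo)
        if candidate in mesh:
            continue  # already part of the mesh
        extended = mesh + [candidate]
        if not has_ambiguity(extended) and not has_nesting(extended):
            proposals.append(candidate)
    return proposals


mesh = [
    frozenset({"pen", "paper", "writing"}),
    frozenset({"table", "chair", "writing"}),
]
suggestions = saturate(mesh)
# Restricting the range of topics: <pencil, paper, writing> would clash with pen.
rejected = saturate([frozenset({"pen", "paper", "writing"})],
                    allowed={"pencil", "paper", "writing"})
```

As in the text, whether a proposed cluster is meaningful remains a decision for the user; the filter only removes proposals that would be structurally ambiguous.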


4. Bootstrapping in graphs 


4.1. ENTAILMENT MESHES AS NON-DIRECTED GRAPHS 


Although THOUGHTSTICKER is more flexible than traditional AI knowledge 
representation systems, its underlying Lp logic has a number of inherent shortcomings. 
Lp's structure is too loose to support “inferences” in the stronger sense of logical 
deductions showing the truth or falsity of certain expressions relative to other 
expressions. Entailment meshes also do not have any obvious connection with the— 
admittedly limited—knowledge we have about cognition and brain functioning. The 
most important limitation, however, seems to me the inherent non-directionality of its 
entailment meshes. (It must be noted that in Conversation Theory entailment meshes 
are derived from the more complex relational networks and entailment networks, which 
are directional (Pask, 1975). However, the whole thrust of the approach is to simplify 
these rather unwieldy structures until just an entailment mesh is left. Some kind of 
directionality can be recovered in a mesh by the “Prune” operation, but this 
directionality is purely relative, dependent on the specific topic that is being unfolded.) 

Pask's motivation for non-directionality was to replace the traditional hierarchical 
structures—where higher order concepts are always reduced to (non-grounded) 
primitives—by a heterarchical (“bootstrapping”) structure—where concepts mutually 
produce each other. Directionality does not imply hierarchy or reducibility, though. 
When I started working on my structural language (Heylighen, 1990a), I made it 
intrinsically directional in order to model processes and their “arrow of time”, not 
hierarchical dependencies. The advantage of directed structures is that they encompass 
non-directed structures as a special case, whereas the converse is not true. In the 
mathematics of relations, non-directionality corresponds to symmetry. A relation R is 
symmetric if for every pair for which the relation holds, (a, b) ∈ R, the inverse pair also 
satisfies the relation: (b, a) ∈ R. General relations need not be symmetric, though, which means 
that the inverse pair can either be there, or not. So, a non-directed structure or graph can 
always be created by limiting the allowed relations to symmetric ones. On the other 
hand, there is no clear way to create a directed structure if only symmetric relations are 
available. 

Let me show how such a relational representation generalizes the structure 
underlying entailment meshes. Two topics a, b in a coherence are connected by the 
reflexive and symmetric relation C: “belongs to the same coherence as”, which we might 
also read as “is coherent with”. Symmetry means that for all a and b, a C b ⇔ b C a. 
Moreover, within one coherence the relation is also transitive: if a C b and b C c, then a 
C c. However, the relation is no longer transitive if we consider topics belonging to 
different but overlapping coherences. Consider the two coherences <a, b, c> and <c, d, 
e>. In that case we have a C c, and c C d, but not a C d. The combination of reflexivity, 
symmetry and transitivity defines an equivalence relation. The individual coherence 
clusters could then be viewed as the equivalence classes induced by the relation C. Since 
C is only locally transitive, its equivalence classes can overlap, instead of partitioning 
the set of all topics into separate clusters, like a full equivalence relation would. 
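
The local character of transitivity can be checked directly on the example. The sketch below assumes the relation C is stored as a set of ordered pairs built from the coherences; the helper names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple

Relation = Set[Tuple[str, str]]


def coherence_relation(mesh: List[FrozenSet[str]]) -> Relation:
    """C: 'belongs to the same coherence as', built from all pairs within each coherence."""
    return {(a, b) for coherence in mesh for a, b in product(coherence, repeat=2)}


def is_symmetric(rel: Relation) -> bool:
    return all((b, a) in rel for a, b in rel)


def is_transitive(rel: Relation) -> bool:
    return all((a, d) in rel for a, b in rel for c, d in rel if b == c)


mesh = [frozenset({"a", "b", "c"}), frozenset({"c", "d", "e"})]
C = coherence_relation(mesh)

print(is_symmetric(C))                       # symmetric by construction
print(("a", "c") in C, ("c", "d") in C)      # both pairs hold
print(("a", "d") in C)                       # but a and d share no coherence
print(is_transitive(C))                      # so C is not globally transitive
print(is_transitive(coherence_relation(mesh[:1])))  # within one coherence it is
```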

The representation of an entailment mesh through its collection of coherences is 
thus equivalent to its representation by the symmetric relation C defined on the set of 
all topics T = {a, b, c, ...}: C ⊆ T × T. The fundamental requirement of the avoidance of 
structural ambiguity can then be formulated in the following way. Every topic a is 
derived from the other topics it is coherent with: {x ≠ a | a C x} = I(a). Ambiguity is the 
situation where we have two distinct topics a ≠ b but such that I(a) = I(b). 


4.2. ENTAILMENT NETS AND THE BOOTSTRAPPING AXIOM 


Let us generalize the above scheme by considering a general, directed, i.e. not necessarily 
symmetric, relation. I will call this relation “entailment” and its instances “links”, denoting them by 
right arrows: →. Instead of “topics”, I will call the elements connected by the 
entailment relation “nodes”, or “concepts”. The graphical representation of nodes and 
links will take the form of an unlabelled directed graph, with the only constraint that 
there can be at most one link between any two nodes (see Fig. 3 for an example). In 
analogy to the entailment meshes, we may call such graphs entailment nets. 


Fig. 3: an entailment net consisting of nodes (a, b, c, ...) connected by links. I(a) denotes the set of 
input nodes of a, O(a) the set of its output nodes. 


For every node or concept a, there will now be two relevant sets, depending on the 
direction in which we follow the relation, i.e. depending on whether we look at the 
concepts that a entails, or that are entailed by it. We can thus define the input and 
output sets of a concept a (see Fig. 3): 


Input: $I(a) = \{x \mid x \to a\}$ 
Output: $O(a) = \{x \mid a \to x\}$ 


As in an entailment mesh, the meaning (definition, distinction) of a can be interpreted as 
derived or produced from these two sets. The requirement of ambiguity avoidance can 
be formulated most generally as the following bootstrapping axiom (Heylighen, 1990a, 1990d, 
1991b): 


two concepts are distinct if and only if their input and/or output sets are distinct: 
$a \neq b \Leftrightarrow I(a) \neq I(b)$ and/or $O(a) \neq O(b)$ 


Thus, the concept a is unambiguously defined or determined by the other concepts it is 
connected with by the entailment relation. This definition is “bootstrapping” because 
the elements in I(a) and I(b) are of course themselves only distinguished by virtue of 
their own connections with distinct elements, including the original a and b. It is not 
recursive in the conventional sense, because there is no privileged set of primitive 
elements in terms of which all others are defined, as in a traditional symbol-based 
knowledge representation. Note that the axiom implies that concepts with empty input 
and output sets (i.e. independent, disconnected nodes) cannot be distinguished at all. 
Although the bootstrapping axiom is formulated as a static, logical requirement, its 
practical value lies in the dynamic construction of new concepts and entailments. 
Suppose that a user is developing a knowledge representation consisting of different 
concepts together with their entailment relations. A computer program like 
CONCEPTORGANIZER would then constantly analyse the resulting network looking for 
ambiguities, i.e. nodes that are different yet have the same input and output sets. If it 
found such an ambiguity, it would suggest a number of methods for resolving it, 
analogous to the methods proposed by THOUGHTSTICKER. These obviously divide into 
two main strategies: either merge the ambiguous nodes if they are really equivalent, or 
differentiate their respective input and output sets by adding, splitting or deleting nodes 
(Heylighen, 1991a). 
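
The ambiguity check implied by the bootstrapping axiom can be sketched as follows. The representation (links as a set of ordered pairs) and the function names are assumptions made for illustration; they are not taken from CONCEPTORGANIZER.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

Link = Tuple[str, str]  # (x, y) stands for the entailment x -> y


def input_set(links: Set[Link], a: str) -> FrozenSet[str]:
    """I(a) = {x | x -> a}"""
    return frozenset(x for x, y in links if y == a)


def output_set(links: Set[Link], a: str) -> FrozenSet[str]:
    """O(a) = {x | a -> x}"""
    return frozenset(y for x, y in links if x == a)


def ambiguities(nodes: Iterable[str], links: Set[Link]) -> List[Tuple[str, str]]:
    """Pairs of different nodes that violate the bootstrapping axiom."""
    return [
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if input_set(links, a) == input_set(links, b)
        and output_set(links, a) == output_set(links, b)
    ]


links = {("x", "a"), ("x", "b"), ("a", "y"), ("b", "y")}
print(ambiguities({"x", "a", "b", "y"}, links))        # a and b cannot be told apart

# Disconnected nodes have empty input and output sets, so they are ambiguous too.
print(ambiguities({"x", "a", "b", "y", "p", "q"}, links))
```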

The THOUGHTSTICKER strategy of merging coherences could be translated into an 
entailment net by adding a two-way entailment between the ambiguous nodes. For 
example, if I(a) = I(b) and O(a) = O(b) then adding the entailments b → a and a → b 
would add a to I(b) and O(b), and add b to I(a) and O(a), thus distinguishing the 
respective input and output sets. The two-way entailment could be interpreted as a 
“similar, yet different” relation. (However, if the entailment were also reflexive, i.e. 
included the links a → a and b → b, then we would be back to ambiguity since the input 
and output sets would again be identical.) 
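
This argument can be verified on a small net. The sketch assumes links are stored as ordered pairs; the helper names `io_sets` and `ambiguous` are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Link = Tuple[str, str]  # (x, y) stands for the entailment x -> y


def io_sets(links: Set[Link], node: str) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (I(node), O(node))."""
    inputs = frozenset(x for x, y in links if y == node)
    outputs = frozenset(y for x, y in links if x == node)
    return inputs, outputs


def ambiguous(links: Set[Link], a: str, b: str) -> bool:
    """True if a and b have identical input sets and identical output sets."""
    return io_sets(links, a) == io_sets(links, b)


base = {("x", "a"), ("x", "b"), ("a", "y"), ("b", "y")}
two_way = base | {("a", "b"), ("b", "a")}        # add a -> b and b -> a
reflexive = two_way | {("a", "a"), ("b", "b")}   # additionally add a -> a and b -> b

print(ambiguous(base, "a", "b"))       # True: I(a) = I(b) and O(a) = O(b)
print(ambiguous(two_way, "a", "b"))    # False: b is in I(a), O(a); a is in I(b), O(b)
print(ambiguous(reflexive, "a", "b"))  # True: both nodes now occur in all four sets
```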

The richer structure of entailment nets makes it less likely to encounter ambiguities 
in an arbitrary net, since both input and output sets must be identical to have an 
ambiguity. However, there are different types of “approximate” ambiguities, which do 
not strictly transgress the bootstrapping axiom, but where the difference is not very 
large. For example, the two input sets of a pair of nodes could be identical, while their 
output sets differ only in one element. In such cases, it is possible that the single 
difference is accidental, and that the nodes would better be merged. Alternatively, the 
nodes may well differ in more than one respect, in which case the computer system will 
suggest that the user add a difference in the input sets which reflects the existing 
difference in the output sets. 


4.3. NODE INTEGRATION 


A system based on entailment nets would also allow an equivalent of 
THOUGHTSTICKER's “condense” operation for creating a more general concept, by 
integrating a number of more specific concepts. This may be called node integration, as 
distinguished from the node identification (merging) suggested in the case of ambiguity 
(cf. Bakker, 1987; Stokman & de Vries, 1988). 

A cluster of concepts that are strictly distinct according to the bootstrapping axiom 
may still be indistinguishable from outside the cluster, because they have the same input 
and output links with nodes outside the cluster: consider a set of nodes 
$A = \{a_i \mid i = 1, \ldots, n\}$, with the property that for all $1 \le i, j \le n$: 
$I(a_i) \setminus A = I(a_j) \setminus A$ and $O(a_i) \setminus A = O(a_j) \setminus A$ 
(where “\” stands for the set-theoretic difference). In that case, the nodes 
belonging to A can be integrated and replaced by a single new node A, resulting in a 
much simplified graph, without affecting the distinctions between nodes outside of A. 
See Fig. 4 for an example. 


Fig. 4: (a) a cluster $\{a_1, a_2, a_3\}$ of nodes that are externally indistinguishable; (b) the graph resulting 
from the replacement of that cluster by the single node A. 
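
Node integration can be sketched in a few lines. The code assumes links are ordered pairs and, as a simplifying assumption not stated in the text, simply drops the links internal to the cluster when it is replaced by the new node. The function names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Link = Tuple[str, str]  # (x, y) stands for the entailment x -> y


def external_io(links: Set[Link], node: str, cluster: FrozenSet[str]):
    """Return (I(node) \\ A, O(node) \\ A) for the cluster A."""
    inputs = frozenset(x for x, y in links if y == node) - cluster
    outputs = frozenset(y for x, y in links if x == node) - cluster
    return inputs, outputs


def externally_indistinguishable(links: Set[Link], cluster: FrozenSet[str]) -> bool:
    """True if all cluster nodes have the same external input and output sets."""
    return len({external_io(links, node, cluster) for node in cluster}) == 1


def integrate(links: Set[Link], cluster: FrozenSet[str], new_node: str) -> Set[Link]:
    """Replace the cluster by `new_node`; links inside the cluster are dropped."""
    def rename(node: str) -> str:
        return new_node if node in cluster else node

    return {
        (rename(x), rename(y))
        for x, y in links
        if not (x in cluster and y in cluster)
    }


cluster = frozenset({"a1", "a2", "a3"})
links = {
    ("x", "a1"), ("x", "a2"), ("x", "a3"),   # same external input
    ("a1", "y"), ("a2", "y"), ("a3", "y"),   # same external output
    ("a1", "a2"), ("a2", "a3"),              # internal links keep the nodes distinct
}
if externally_indistinguishable(links, cluster):
    simplified = integrate(links, cluster, "A")
    print(sorted(simplified))
```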


Again, the property of external indistinguishability of a cluster can be relaxed to 
approximate indistinguishability. This means that the external input and output sets of 
the cluster elements are not completely identical, but have most elements in common. 
This would be the case with concepts that have a “family resemblance”, with many 
common properties, but some idiosyncratic differences. One way to model this is to
introduce a kind of conditional probability measure P of finding a particular node in b's
input (respectively, output) set, given that that node already belongs to a's input
(respectively, output) set:

$$P^I(b, a) = \frac{\#(I(a) \cap I(b))}{\#I(a)}$$

$P^O(b, a)$ can be defined in the same way, with I(x) replaced by O(x). We have the
property that $P^I(b, a) = 1 \Leftrightarrow I(a) \subseteq I(b)$. The bootstrapping axiom is then equivalent
to:

$$a = b \Leftrightarrow P^I(b, a) = P^O(b, a) = P^I(a, b) = P^O(a, b) = 1$$

We can now define a general similarity measure between a and b as:

$$S(a, b) = \frac{P^I(b, a) + P^O(b, a) + P^I(a, b) + P^O(a, b)}{4}$$


The bootstrapping axiom then says that S(a, b) = 1 if and only if a and b are identical.
Smaller values, 0 < S(a, b) < 1, imply that a and b are similar to some degree, but not
identical. If we choose a particular threshold value $S_t$ (say, $S_t = 0.8$), then we can consider
all pairs of nodes (a, b) for which $S(a, b) > S_t$ as candidates for integration. For larger
collections of mutually similar nodes, we can apply different statistical clustering 
techniques to determine the best overall subset for integration. This general approach is
similar to the algorithm proposed by Fisher (1987), which uses conditional probabilities 
for clustering together concepts on the basis of the number of properties (in an 
entailment net, this would mean input and output elements) they have in common. Such 
conceptual clustering is a basic form of what is usually called “machine learning”, 
“knowledge discovery” or “data mining”, i.e. the automatic retrieval of regularities in 
large sets of interrelated data. 
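
As a rough illustration of the similarity measure and threshold test of this section, consider the following Python sketch. The function names and the toy input and output sets are hypothetical. One assumption goes beyond the text: when a's input (or output) set is empty, the ratio is undefined, and the sketch returns 1 on the grounds that the empty set is contained in every set.

```python
from itertools import combinations

def cond_prob(sets, b, a):
    """P(b, a) = #(X(a) ∩ X(b)) / #X(a), where X is either I or O."""
    if not sets[a]:
        return 1.0  # assumption: the empty set is a subset of any set
    return len(sets[a] & sets[b]) / len(sets[a])

def similarity(a, b, inputs, outputs):
    """S(a, b): average of the four conditional probabilities."""
    return (cond_prob(inputs, b, a) + cond_prob(outputs, b, a)
            + cond_prob(inputs, a, b) + cond_prob(outputs, a, b)) / 4

def integration_candidates(nodes, inputs, outputs, threshold=0.8):
    """All pairs of nodes whose similarity exceeds the threshold S_t."""
    return [(a, b) for a, b in combinations(sorted(nodes), 2)
            if similarity(a, b, inputs, outputs) > threshold]

# Hypothetical data: a and b share all inputs and three of four outputs.
inputs = {"a": {"p", "q", "r", "s"}, "b": {"p", "q", "r", "s"}, "c": {"p"}}
outputs = {"a": {"u", "v", "w", "x"}, "b": {"u", "v", "w", "y"}, "c": {"z"}}

s_ab = similarity("a", "b", inputs, outputs)
candidates = integration_candidates(["a", "b", "c"], inputs, outputs)
```

Here S(a, b) = (1 + 0.75 + 1 + 0.75)/4 = 0.875, so with $S_t = 0.8$ only the pair (a, b) is proposed for integration.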


5. Bootstrapping semantic networks 


A shortcoming of both Pask's entailment meshes and the entailment nets as I have 
presented them until now is that there is no clear interpretation of the “entailment” that 
keeps the concepts together. This very abstract relation tells us that concepts somehow 
depend on each other, but it does not tell us how. 

Another knowledge representation scheme from AI, the semantic network 
(Brachman, 1977; Shastri, 1988; Sowa, 1991), is based on similar nets of interdependent 
concepts, but here the dependencies are classified into distinct types with specific 
interpretations. For example, different types of relations might specify that a concept a 
“causes” another concept b, that a “is a part of” b, or that a “is a special case of” b. The 
motivation underlying semantic networks is that concepts get their meaning through the 
semantic relations they have with other concepts. This is similar to the bootstrapping 
philosophy underlying entailment meshes and entailment nets.

However, semantic networks do not solve the “symbol grounding” problem; they 
merely push it to another level of description. Although the nodes are supposed to 
mutually define each other, the relations or link types do not. Like the symbols in a 
more traditional representation, the link types used in a semantic network are 
primitives, which are more or less arbitrarily imposed by the system designer. In 
practice, this has produced confusion, with different researchers using different link 
types, or interpreting what seem to be the same link types in a different manner. 
Moreover, since the number of link types is not a priori limited, there is a tendency to 
solve problems by creating a new ad hoc link type each time it is not clear how a 
particular relation can be expressed using the existing link types. Because of this 
semantic confusion, the empirical verification of semantic network-based theories of 
cognition has produced ambiguous results, sometimes seeming to confirm the theory, 
sometimes seeming to contradict it. Yet, a well-chosen set of link types can produce a 
kind of intuitive recognition, which may help the user to understand knowledge 
formulated as a semantic network more easily than knowledge expressed in a more 
abstract or sparse formalism. 

For this reason I have tried to reconstruct a semantic network-like structure within 
my entailment nets (Heylighen, 1991b), while keeping to the fundamental requirement 
that all meaning or distinction only be justified by bootstrapping within the system, not 
by appealing to external reality as an invisible arbiter. The trick to bootstrap link types 
is simply to reduce them to node types, which themselves are reduced to nodes. Since 
all nodes are subjected to the bootstrapping axiom, this indirectly subjects link types to 
bootstrapping as well. Let me explain this procedure in more detail. 


5.1. ENTAILMENT AS GENERALIZED “IF... THEN”


The first step is to propose a general interpretation of the existing nodes and links in an 
entailment net. Until now, I have merely sketched the representation formalism and 
pointed out its analogies with Pask's Lp expressions, without clarifying what its 
elements concretely stand for. Since nodes are defined by the way they are 
distinguished, it is natural to interpret them as basic cognitive distinctions (cf. Spencer 
Brown, 1969; Heylighen, 1990b,d), that is, as classes of phenomena that are separated 
or distinguished by an observer from all other phenomena that do not belong to the 
class. It can be argued that all perceptual and cognitive entities, such as concepts, 
patterns, categories, or experiences, are distinctions. They are all triggered by certain 
phenomena but not by others. They thus separate the universe of phenomena into two 
complementary classes: those that fit the concept (the “indication” or “marked” side of 
the distinction, according to Spencer Brown, 1969), and those that do not. A distinction 
can be seen as the most fundamental unit of cognition. 

Whereas the distinctions cut up the phenomenal universe, the entailment relation 
connects everything back together. If distinction a entails distinction b, we might say 
that given a we can somehow expect to see b as well. This is a kind of a generalized or 
weakened form of “if a, then b”. In order to recover more of the properties of the 
traditional “if...then” in logic (implication), we should at least demand transitivity of the 
relation. Indeed, “if a, then b” and “if b, then c” together imply “if a, then c”. The
entailment relation does not have any a priori properties such as transitivity, symmetry
or reflexivity. Yet, we can always choose to focus attention on that part of the
entailment relation that is transitive, and to interpret it as an “if...then”. (One way to
mathematically generate the transitive part is to take the intersection of an entailment
relation E with its second power: $E_{\mathrm{trans}} = E \cap E^2$).
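
The construction $E \cap E^2$ can be made concrete with a short Python sketch. The names `compose` and `transitive_part`, and the example links, are hypothetical. The sketch simply applies the formula as stated: a link a → c is kept only if there is also some intermediate b with a → b and b → c.

```python
def compose(r, s):
    """Relational composition: pairs (a, c) such that (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def transitive_part(e):
    """E ∩ E²: the direct links that are also supported by a two-step path."""
    return e & compose(e, e)

# Hypothetical entailment relation.
E = {("dog", "carnivore"), ("carnivore", "mammal"),
     ("dog", "mammal"), ("dog", "tail")}

E_trans = transitive_part(E)
```

In this example only dog → mammal survives, because it is the only direct link that is confirmed by a path through an intermediate node (carnivore).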

The interpretation of this “if a, then b” between distinctions is that the class a is 
somehow subsumed in or followed by the class b. For example, if a phenomenon is a 
dog, then it is also a mammal: dog → mammal. It means that a phenomenon denoted
by the first concept cannot be present or actual, without a phenomenon denoted by the 
second one being (simultaneously) or becoming (afterwards) actual. As the example 
shows, a primary instantiation of entailment is the relation between an instance or 
subclass and the more general class to which it belongs. The more general class (e.g. 
mammal) can be seen as grouping a number of related, more concrete concepts (e.g. dog, 
cat, mouse, deer, etc.). With such an interpretation we can reinterpret the input of a 
concept x, I(x), as its “extension”, i.e. the set of its instances, and its output O(x) as its 
“intension”, i.e. the conjunction of its defining features. The meaning of x, as expressed 
by the bootstrapping axiom, can then be interpreted as determined by the disjunction of 
its input elements, and the conjunction of its output elements. 

This interpretation suggests yet another heuristic for distinguishing “similar” nodes.
Suppose you have two nodes a and b, such that $I(a) \subset I(b)$, while $O(b) \subset O(a)$. In that
case the system might suggest to the user to create a link from a to b, a → b, assuming
that a concept a with a smaller extension and a larger intension than b is likely to be a
special case or subcategory of b. This case can be expressed using the conditional
probability formulas as: $P^I(b, a) = P^O(a, b) = 1$, $P^O(b, a) < 1$, $P^I(a, b) < 1$.
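
A minimal Python sketch of this heuristic follows. The function name `suggest_subclass_links` and the example data are hypothetical, and the inclusions are read as proper inclusions, in line with the strict inequalities above.

```python
def suggest_subclass_links(inputs, outputs, edges):
    """Propose a -> b when I(a) is a proper subset of I(b) and O(b) of O(a)."""
    nodes = sorted(set(inputs) | set(outputs))
    suggestions = []
    for a in nodes:
        for b in nodes:
            if a == b or (a, b) in edges:
                continue  # skip self-pairs and links that already exist
            ia, ib = inputs.get(a, set()), inputs.get(b, set())
            oa, ob = outputs.get(a, set()), outputs.get(b, set())
            if ia < ib and ob < oa:  # "<" on sets tests proper inclusion
                suggestions.append((a, b))
    return suggestions

# Hypothetical net in which "dog" is not yet linked to "mammal".
inputs = {"dog": {"fido"}, "mammal": {"fido", "tom"}}
outputs = {"dog": {"barks", "warm-blooded"}, "mammal": {"warm-blooded"}}

suggested = suggest_subclass_links(inputs, outputs, edges=set())
```

With these sets, "dog" has the smaller extension and the larger intension, so the only suggestion is the link dog → mammal.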


5.2. ONTOLOGICAL DISTINCTIONS AS BASIC NODE TYPES 


If we follow the entailment relation “upward” between classes we will reach ever more 
abstract or more general distinctions, e.g. dog → carnivore → mammal → vertebrate
→ animal → organism → object. The most abstract classes can be interpreted as the
philosopher's “categories”, the most basic or universal distinctions which underlie our
understanding of the universe. These fundamental concepts can be said to define an 
“ontology”, i.e. a theory of the most fundamental categories of existence, such as time, 
space, matter, truth, cause and effect. These ontological distinctions can be interpreted 
as basic node types, which allow a classification of other, more concrete nodes. 

Note that the word “ontology” has recently received a broader and more practical 
meaning in the domain of knowledge representation, where it denotes the complete 
system of concepts, with their definitions and relationships, that support a shared 
conceptualization of a domain (Uschold and Gruninger, 1996). The original 
philosophical meaning, which I use here, is closer to what Uschold and Gruninger call a 
“meta-ontology”, i.e. a set of allowable types of concepts. What they call an “ontology” 
would correspond to the whole system of concepts in an entailment network. In that 
sense, CONCEPTORGANIZER could be viewed as a tool for building ontologies. As an 
illustration, the Principia Cybernetica Project has developed an ontology of basic 
systems theoretic concepts in the form of a semantic network (Joslyn, Heylighen & 
Bollen, 1997). 

In order to develop a skeleton (meta-)ontology, or list of fundamental node types, I 
have proposed two basic dimensions of distinction: stability (or time) and generality 
(Heylighen, 1991b). Stability distinguishes more from less temporally variable 
phenomena, while generality distinguishes abstract, universal classes from their concrete, 
particular instances. Although these distinction dimensions can in principle have 
continuous values, it is simpler to consider discrete classes. Stability can be seen as 
having three possible values: transitional (i.e. with no distinguished duration), temporary 
(with a finite duration), and stable (with an indefinite duration), and generality two: 
particular - general. The combination of these 3 x 2 values leads to 6 basic types of
distinction (see table 1).

               General     Particular
Stable         class       object
Temporary      property    situation
Transitional   change      event

Table 1: 6 basic node types, generated as combinations of 3 values for the time dimension, and 2 values for the generality dimension.

For example, an object is a distinction that is stable (it is not supposed to appear or
disappear while we are considering it), and particular (it is concrete, there is only one of 
it). A property is a distinction that is general (several phenomena may be denoted by it, 
it represents a common feature), and temporary (it may appear or disappear, but 
normally it remains present during a finite time interval). An event is instantaneous (it 
appears and disappears within one moment), and particular (it does not denote a class of 
similar phenomena, but a specific instance). Events can be seen as discrete changes, as 
transitions in which something was present and suddenly is no longer present, or was 
absent and suddenly appears. Changes can be seen as classes of events or as 
“transformations” (in the mathematical sense of a function mapping a set of initial states 
onto a set of subsequent states). 

Some examples will further clarify this classification. “Dog” is a class, and so is 
“mammal”. My own dog “Fido”, on the other hand, is an object that belongs to the class 
of dogs. “Barking” is a property that Fido may or may not have at any particular 
moment. “Being able to bark”, on the other hand, is a class that mostly overlaps with 
the class of “dog”. “Fido is barking” is a situation, a temporary state of affairs, which in 
this case involves a particular object. “The 2nd World War” is another situation. “The 
beginning of the 2nd World War’, on the other hand, is an event. “Starting a war” is a 
general change. 

These ontological distinctions are meant to be an aid to analysis, not an absolute 
statement about the structure either of the world or of the formal system. They are not 
at the same level of importance as the bootstrapping axiom. Ideally, ontological 
distinctions should be wholly derivable from the formal structure of an entailment net, 
rather than being entered by the system designer. One possible avenue to achieve this is 
to use a generalized principle of mathematical closure to create higher order distinctions 
(see Heylighen, 1990d, and Heylighen, 1990a for an example application). At present, 
though, ontological distinctions should be seen merely as a checklist (cf. Heylighen, 
1991a) of distinctions that are likely to be relevant. Their use is not mandatory. Indeed, 
it is easy to conceive of a concept in an entailment network that cannot be readily 
classified along either of the ontological dimensions. 

Consider for example the concept of “the American democracy”. Does this concept 
refer to a particular institution that exists in one country, or to a more general method of 
governing that could be applied to many instances? Does it represent the situation at 
this moment in time, or a system that may survive indefinitely? Different people are 
likely to answer these questions differently in different contexts. Yet the concept of 
“American democracy” is sufficiently clear that it can be meaningfully used in 
communicating and representing knowledge. It can therefore be represented in an 
entailment network, perhaps with entailments linking it to “the American senate” and 
“the Bill of Rights”, but without needing to be linked to one of the ontological concepts 
of “object”, “class”, “situation”, etc. Still, the creation of more specialised distinctions 
such as “the American democracy as method of governing” (class) or “the American 
democracy as an institution during the Reagan years” (situation) may be helpful to 
elucidate semantic confusions between different (p-) individuals who use the same
words but speak about different things. 


5.3. BASIC LINK TYPES 


With these generic node types we can now produce a number of corresponding link 
types by considering the possible combinations of nodes between which an “if... then” 
link can exist. There is one constraint on these combinations, though: a more “invariant” 
(more stable or more general) distinction can never entail a less invariant one. Otherwise, 
the second would be present each time the first one is present, contradicting the
hypothesis that it is less invariant than the first one. For example, a class cannot entail 
an object and a situation cannot entail an event. Yet, two concepts with the same type 
of invariance (e.g. two objects) can be connected by an entailment relation. The 
remaining possible combinations are summarized in figure 5. 


[Figure: the six node types arranged by stability (stable, temporary, transitional) and generality (general, specific), connected by link types including Instance_Of, Has_Property, Produces and Precedes.]

Fig. 5: Link types derived from the allowed combinations between node types; the straight arrows represent entailment from one type to another (more invariant) one, the circular arrows represent entailment from a concept of one type to a concept of the same type.
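
The constraint on allowed combinations can be sketched as a simple check in Python. This is one possible reading, stated as an assumption: an entailment is allowed only if the target is at least as stable and at least as general as the source. The enumeration names and the function `may_entail` are hypothetical; the placement of the six node types follows their descriptions in section 5.2.

```python
from enum import IntEnum

class Stability(IntEnum):
    TRANSITIONAL = 0
    TEMPORARY = 1
    STABLE = 2

class Generality(IntEnum):
    PARTICULAR = 0
    GENERAL = 1

# Node types as (stability, generality) pairs, following section 5.2.
NODE_TYPES = {
    "object": (Stability.STABLE, Generality.PARTICULAR),
    "class": (Stability.STABLE, Generality.GENERAL),
    "situation": (Stability.TEMPORARY, Generality.PARTICULAR),
    "property": (Stability.TEMPORARY, Generality.GENERAL),
    "event": (Stability.TRANSITIONAL, Generality.PARTICULAR),
    "change": (Stability.TRANSITIONAL, Generality.GENERAL),
}

def may_entail(src_type, dst_type):
    """Assumed rule: the target may not be less stable or less general than the source."""
    s_src, g_src = NODE_TYPES[src_type]
    s_dst, g_dst = NODE_TYPES[dst_type]
    return s_dst >= s_src and g_dst >= g_src

allowed = [(s, d) for s in NODE_TYPES for d in NODE_TYPES if may_entail(s, d)]
```

Under this reading, an object may entail a class or another object, while a class entailing an object, or a situation entailing an event, is rejected, in agreement with the examples given in the text.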


Let us discuss the most important link types in this scheme. When an object a entails a 
class b, a → b, then a is an “instance of” b. When a class a entails another class b, then a
is “a kind of” b. The union of the relations “a kind of” and “instance of” may be called
“is a”. This is the most popular link type used in traditional semantic networks. For
example, “a dog is a mammal” and “Fido is a dog”. Note that the traditional “has
property” relation cannot be used to link an object with a property. If you wish to
express the fact that an object has a property (which because of its temporary character
cannot be expressed as a class), e.g. “Fido is tired”, then you must create a situation
which involves (only) the object “Fido” and has the property “tired”.

When an object a always entails the presence of another object b, then b must 
belong to or be a part of a. (If two objects entail each other, they are either identical or 
so tightly bound that they cannot be separated.) Note that the direction of the “has 
part” entailment may seem counter-intuitive, since it runs from a “larger” whole to a 
“smaller” part, whereas the “is a” entailment runs from a “smaller” class to a “larger” 
superclass, thus putting the conventional hierarchy on its head. The (physical) size of 
an object has nothing to do with the (logical) size of a class, though. One should beware 
of the intuitive tendency to see “is a kind of” as analogous to “is a part of”, and
remember that in terms of entailment it is equivalent to the inverse relation “has as 
part”. 

The entailment from property to class is a simple implication from temporary to 
stable features. E.g. if something falls it has a mass: falling → mass, yet objects that
have mass (permanently) are not permanently falling. Property entailing property is
again a simple implication (or succession), now between temporary features, e.g. falling
→ moving.

The application of the entailment relation to stable distinctions follows relatively 
closely the logical implication. However, when applied to distinctions that have an 
element of time, the “then” part of “if...then” can be interpreted as an indication of 
temporal succession. This makes it possible to express the flow of time in the
formalism. The cognitive interpretation is that the main function of knowledge is 
anticipation: trying to foresee the future on the basis of the present (Heylighen, 1993; 
Turchin, 1993). The simplest case is the entailment between events, which can be 
interpreted as “precedes or is simultaneous with”. With this interpretation, my original 
structural language for the foundations of physics (Heylighen, 1990a) corresponds to 
that subset of the present representation scheme in which only events and precedence 
relations are considered. As shown in (Heylighen, 1990a), the network structures 
present in this subset are sufficient to reconstruct the primitive geometry of relativistic 
space-time. 

The generalization of an event to a class of similar events corresponds to a change. 
When a change a entails another change b, then a and b “covary” and since a is prior to 
b, we can interpret a as the cause of b. For example, “heating water (change) causes 
(entailment) boiling (change)”. There is no direct way to express by a link that an event 
a causes another event b, except by noting that a precedes b, and that a and b are two 
instances of general change phenomena which are related by a causal relation. (This is 
not a shortcoming of the present knowledge representation, but a general property of 
causation, which can only be established after observing a repeated covariation.) A 
transitional phenomenon can also entail a more long lasting phenomenon. For example, 
“boiling (change) produces vapor (property or class)”. Similarly, an event (e.g. “the 
1939 invasion of Poland”) can precede a situation (e.g. “the 2nd World War”). 
Sometimes this precedence can also be interpreted as a production, if event and ensuing 
situation can be interpreted as instances of a change producing a property (e.g. 
“invasions produce war”), but this is in general not the case (e.g. “the release of 'Gone
with the Wind' preceded the 2nd World War”) (see Fig. 6). 


Fig. 6: an entailment network representing the semantic relations “the invasion of Poland produced
World War II”, and “'Gone with the Wind' preceded World War II”. Bold face denotes concepts that
function as node types, italics denotes the most abstract category of node types.


The fact that a short-lived phenomenon (e.g. an event, change or situation) can 
entail a more long-lived one (e.g. a property, class or object), but not the other way 
around, is an illustration of the “principle of asymmetric transitions” governing systems 
evolution: a transition from an unstable configuration to a stable one is possible, but the 
converse is not (Heylighen, 1992b). Of course, we can formulate expressions such as 
“the gun (stable) produced a shot (transitional)”, but this is just a shorthand for saying 
that a particular event (transitional) involving a gun, a finger and a movement of the gun's 
trigger, produced another event, a shot. This example illustrates how the simple 
constraints inherent in the present ontology (and in entailment nets in general) force the 
user to make implicit—but necessary—distinctions explicit, thus avoiding ambiguities 
and gaps in the knowledge system. 

In conclusion, the advantage of this representation scheme is that most of the 
intuitive and often used semantic categories (objects, classes, causality, whole-part 
relations, temporal precedence, etc.) can be directly expressed in it, using a simple and 
uniform format. The resulting representation appears general, consistent and 
unambiguous. This allows us to reduce a complicated set of semantic categories to an 
extremely simple and flexible formal structure. The disadvantage is that many more links 
are needed in order to reduce various link types to nodes than if the links could be 
simply labeled by their types. However, the burden of keeping track of all the links will 
normally rest on the computer, and not on the user, who could work with a higher level 
representation using typed links, perhaps mixed with untyped links for those cases
where the type may not be clear. All these links will be reduced by the computer to 
their generic, untyped form in order to test for ambiguity according to the bootstrapping 
axiom. 
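
The idea that a higher level, typed view can be derived from untyped links between typed nodes can be illustrated as follows. The table below merely collects the link types named in this section; the dictionary `LINK_TYPES`, the function `link_label` and the example nodes are hypothetical, and the table is not claimed to be complete.

```python
# (source node type, target node type) -> link type, as named in section 5.3.
LINK_TYPES = {
    ("object", "class"): "instance of",
    ("class", "class"): "a kind of",
    ("object", "object"): "has part",
    ("situation", "property"): "has property",
    ("property", "class"): "implies",
    ("property", "property"): "implies",
    ("event", "event"): "precedes",
    ("event", "situation"): "precedes",
    ("event", "change"): "instance of",
    ("change", "change"): "causes",
    ("change", "property"): "produces",
    ("change", "class"): "produces",
}

def link_label(src, dst, node_type):
    """Typed reading of an untyped link src -> dst, or None if no type is known."""
    return LINK_TYPES.get((node_type.get(src), node_type.get(dst)))

# Hypothetical typing of a few nodes (None would mean "not classified").
node_type = {"Fido": "object", "dog": "class", "mammal": "class",
             "heating water": "change", "boiling": "change"}

labels = {
    ("Fido", "dog"): link_label("Fido", "dog", node_type),
    ("dog", "mammal"): link_label("dog", "mammal", node_type),
    ("heating water", "boiling"): link_label("heating water", "boiling", node_type),
}
```

Nodes that have not been assigned an ontological type simply yield no label, which corresponds to the untyped links mentioned above.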


5.4. FRAMES AND INHERITANCE 


Although the scheme above incorporates the most frequently used link types, it can 
never incorporate all possible types. Any two-place predicate or relation could be made 
into a link type, for example: “is the father of”, “is greater than”, “sits to the left of”. 
This freedom will quickly overburden any semantic network scheme. Another AI 
representation scheme, a frame, extends the semantic network representation by 
allowing custom-made relations to be associated with different types of concepts. A 
frame consists of a central concept related to other concepts by attribute-value pairs 
(also called “slots and fillers”). For example, the concept car has attributes (slots) such 
as color, brand, model, and year. A particular instance of that concept, such as my 
own car, will have particular values (fillers) for these respective attributes, such as grey, 
Honda, Civic and 1992. The values of an attribute are in general concepts themselves 
and thus the attribute plays the role of a typed link connecting nodes in a semantic 
network. 

The unrestricted proliferation of attribute-value pairs is kept in check by the 
mechanism of inheritance: concepts inherit most of their attributes (and sometimes also 
values) from the concepts higher up in the “is a” hierarchy, that is, from the classes to 
which they belong as subclasses or as instances. For example, a car is a vehicle, and 
therefore car will inherit a number of attributes such as number of wheels or type of 
fuel from the vehicle category. Vehicle itself will inherit attributes such as size and 
weight from the object category. This has the advantage that each attribute only needs 
to be represented once, in the concept situated at the highest level of the hierarchy 
where it is needed. Thus, the number of attributes or link types is controlled by 
associating each of them with one or a few high level concepts. 

Such a “frame-like” knowledge representation can be easily expressed in an 
entailment net. Suppose I wish to express the fact that my car, which is an instance of 
the class car, has the value grey for its attribute color. Fig. 7 shows how this typical 
attribute-value pair can be represented with untyped entailment links between nodes 
typed through the categorical distinctions introduced earlier. The “is a” link between my 
car and car is obviously an entailment between an object and a class (see Fig. 5). The 
statement that every car has an attribute color is equivalent to the statement that the 
class car is a subclass of, and therefore entails, the class of colored things. Thus, the 
entailment link goes from car to color. Similarly my car, which is an object, belongs to 
the class of grey things which is itself a subclass of the class of colored things. Thus, the 
scheme linking an instance to its class and to an attribute-value pair can be reduced in 
essence to a distinction graph with two times two parallel arrows, in a configuration 
resembling what is called a “commutative diagram” in mathematics. 

Fig. 7: representation of a frame (car) with attribute (color) and value (grey) for an instance (my car), as a
commutative diagram (thick arrows) in an entailment network.


Since “if...then” entailments are supposed to be transitive, inheritance of attributes 
comes automatically. Whether my car gets its attribute number of wheels through the 
class car or through the higher class vehicle should not make a difference. In both cases, 
a commutative diagram can be drawn, with an entailment path running from my car to 
number of wheels, either through car or through vehicle. The inheritance of values in a 
frame has a more subtle feature, though, the default mechanism: a concept inherits its 
values from the class to which it belongs, unless there are values that are directly 
attached to the concept. The values filling the slots at the level of the concept override 
those filling the same slots at the higher levels from which it normally inherits. 
Inheritance of values only takes place if the values are not specified at the level of the 
concept itself. 

The traditional example of inheritance by default is the concept of penguin, which 
belongs to the class of birds, from which it inherits properties such as being warm- 
blooded, and laying eggs. However, by default the class of birds also has the property of 
being able to fly, which the penguin does not have. This is represented by having a 
specific attribute-value pair (can_fly?, false) attached to the concept penguin, while 
other subclasses of the bird class, such as robins or sparrows simply inherit their 
attribute value pair (can_fly?, true) from the parent class. This can be represented in an 
entailment net by the following simple graph (Fig. 8): 

Fig. 8: representation by means of entailment links of a default inheritance (birds can fly)
being overridden for the exceptional case of “penguin”.


In order to interpret this entailment network correctly, the requirement of 
transitivity for the entailment relation must be relaxed, so that the direct entailment 
penguin → cannot fly can override the indirect path penguin → bird, bird → can fly.
The general rule could be that if two entailment paths from the same node lead to 
contradictory results, the shortest path is to be preferred. 
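
The shortest-path rule can be sketched in Python with a breadth-first search. The function names, the explicit list of contradictory pairs and the example net are hypothetical. The sketch also assumes that when two contradictory conclusions are equally distant, neither is removed, since the text gives no rule for that case.

```python
from collections import deque

def path_lengths(start, outputs):
    """Length of the shortest entailment path from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in outputs.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def conclusions(start, outputs, contradictions):
    """Everything entailed by start, with the longer of two contradictory paths dropped."""
    dist = path_lengths(start, outputs)
    result = set(dist) - {start}
    for p, q in contradictions:
        if p in result and q in result and dist[p] != dist[q]:
            result.discard(p if dist[p] > dist[q] else q)
    return result

# Hypothetical net corresponding to the penguin example.
outputs = {
    "penguin": ["bird", "cannot fly"],
    "robin": ["bird"],
    "bird": ["can fly", "lays eggs"],
}
contradictions = [("can fly", "cannot fly")]

penguin_facts = conclusions("penguin", outputs, contradictions)
robin_facts = conclusions("robin", outputs, contradictions)
```

The penguin keeps the inherited "lays eggs" but loses "can fly", because "cannot fly" is reached in one step and "can fly" only in two; the robin inherits "can fly" unchanged.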

This somewhat ad hoc interpretation may be justified if we consider a more general 
interpretation of an entailment a → b, where it expresses a kind of “expectation” of b,
given a, rather than “certitude” of b, given a. Expectations can come with different 
“strengths” or degrees of certainty. These may be expressed as a kind of conditional 
probability of b, given a: P(b | a). If we assume that typical entailment links have a 
strength of 1 or slightly less, paths of entailments will have a strength that is the 
product of the strengths of the individual links. This will be lower than or equal to the 
strength of the weakest link. Therefore, if the link from penguin to bird has maximal
strength, P(bird | penguin) = 1, while P(can fly | bird) = 0.9, the combined path has strength P(can
fly | penguin) = 0.9, which is overridden by the stronger connection P(cannot
fly | penguin) = 1. We will here not go into further detail about such calculations, but
discuss an application of entailment nets which uses link strengths in a more general 
way. 
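
The arithmetic of this example can be checked with a few lines of Python. The dictionary `strength` and the function `path_strength` are hypothetical names; the values are those given in the text.

```python
from math import prod

# Link strengths P(b | a) from the penguin example.
strength = {
    ("penguin", "bird"): 1.0,
    ("bird", "can fly"): 0.9,
    ("penguin", "cannot fly"): 1.0,
}

def path_strength(path, strength):
    """Strength of a path: the product of the strengths of its links."""
    return prod(strength[(a, b)] for a, b in zip(path, path[1:]))

indirect = path_strength(["penguin", "bird", "can fly"], strength)
direct = path_strength(["penguin", "cannot fly"], strength)
```

The indirect path has strength 1 × 0.9 = 0.9, which is lower than the strength 1 of the direct link, so the direct conclusion prevails.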


6. Learning Webs 


6.1. ASSOCIATIVE NETWORKS 


When we consider links with strengths continuously varying between 0 and 1 (or even 
-1 and 1 if we want to express negative or inhibitory links), the interpretation must shift 
from a generalized “if...then” relation, to the weaker “is associated with” connection. 
This brings us from “semantic” to “associative” networks (Heylighen, 2001). 
Associative networks are in principle more general and more flexible, allowing the 
expression of different “fuzzy”, “intuitive” or even “subconscious” relations between 
concepts. Such networks have been regularly suggested as models of how the brain
works. They are similar to the presently popular “neural” networks, except that the
latter are typically used as directed, information processing systems, which are given a
certain pattern of stimuli as input and are supposed to produce the correct response to 
that pattern as output. In the present “bootstrapping” perspective, there is no overall 
direction or sequence leading from inputs to outputs; there are only nodes linked to each 
other by associations, in such a way that they are coherent with each other and with the 
user's understanding of the knowledge domain. 

The very weak requirement of “associativity” allows virtually any pair of concepts 
to be linked, if only with a very small link strength. According to the bootstrapping 
axiom, if everything is linked to everything, then all nodes become uniformly 
indistinguishable. However, there is a simple way to generalize the bootstrapping axiom 
to associative networks by defining a similarity measure that takes into account 
continuous link strengths, not just the binary property of having a link (strength = 1) or 
not (strength = 0). High values of the similarity measure between two nodes could then 
be seen as indications of ambiguity, that may be resolved through merging, 
differentiating or integrating (clustering) the nodes. 

In section 4.3 similarity between nodes was defined as an average of the different
“conditional probabilities” $P^I$ and $P^O$ for the input and output sets. If we represent the
strength of the link from x to y by $l(x, y)$, then $P^I$ can be defined more generally by the
following formula:

$$P^I(b, a) = \frac{\sum_{x \in I(a)} l(x, a) \cdot l(x, b)}{\sum_{x \in I(a)} l(x, a)}$$

($P^O$ can be defined by a similar formula where the positions of x and a (or b) have been
switched, and summation is done over O(a).) This formula reduces to the one of section
4.3 in the binary case where the only possible values are $l(x, y) = 1 \Leftrightarrow x \in I(y)$, and
$l(x, y) = 0 \Leftrightarrow x \notin I(y)$.
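
The weighted formula and its reduction to the binary case can be illustrated with the following Python sketch. The function name `weighted_p_in` and the example link strengths are hypothetical. Two assumptions are made: I(a) is taken to be the set of nodes with a non-zero link to a, and an empty I(a) yields the value 1, as in the set-based case.

```python
def weighted_p_in(link, b, a):
    """P^I(b, a) for link strengths given as a dict {(x, y): l(x, y)}."""
    sources = {x for (x, y), w in link.items() if y == a and w != 0}  # I(a)
    total = sum(link[(x, a)] for x in sources)
    if total == 0:
        return 1.0  # assumption: undefined ratio treated as full inclusion
    shared = sum(link[(x, a)] * link.get((x, b), 0.0) for x in sources)
    return shared / total

# Binary case: I(a) = {p, q}, I(b) = {p}, so #(I(a) ∩ I(b)) / #I(a) = 1/2.
binary = {("p", "a"): 1, ("q", "a"): 1, ("p", "b"): 1}
p_binary = weighted_p_in(binary, "b", "a")

# Weighted case with hypothetical strengths.
weighted = {("p", "a"): 0.8, ("q", "a"): 0.2, ("p", "b"): 0.5}
p_weighted = weighted_p_in(weighted, "b", "a")
```

In the weighted example the result is (0.8 × 0.5 + 0.2 × 0)/(0.8 + 0.2) = 0.4: the strong input p contributes most, and the absence of a link from q to b lowers the score only slightly.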

In principle, associative networks could be created by the same type of knowledge 
elicitation techniques underlying THOUGHTSTICKER or CONCEPTORGANIZER, where a 
user enters a number of concepts and links and is prompted by the system to add 
further links and concepts under the main constraint of avoiding ambiguity. These links 
must then be attributed some variable degree of strength. However, it is in practice 
impossible to let users realistically estimate a strength value for each of the huge number 
of possible links. Rocha (1991) has suggested a method to “fuzzify” conversation 
theory, by calculating continuously varying conceptual distances between nodes in an 
entailment mesh, on the basis of the number of linked nodes they share, but this 
approach has never been fully worked out. 

In psychology, rudimentary associative networks have been created through 
experiments in which subjects were given a word (say “cat”), and were asked which 
other word first came to mind (e.g. “dog”, “mouse”, or “milk”). The more often a certain 
word b is given in response to the cue word a, the stronger the association from a to b 
(Heylighen, 2001). Since this approach usually only finds a small number of 
associations for any given word, association strengths for links between other words are 
calculated by taking into account indirect associations (e.g. knowing the strengths of 
“dog → cat” and “cat → mouse” would allow one to calculate the strength of “dog → 
mouse”). Note that such associations are in general asymmetric. For example, when cued 
with “penguin” the probability that you would say “bird” is not so small, whereas the 
probability to respond with “penguin”, when cued with “bird” is virtually zero. This 
methodology, however, requires a lot of work from designers and users, and is only 
useful for simple, well-known items like common words. 


6.2. ALGORITHMS FOR LEARNING HYPERTEXTS 


My collaborator Johan Bollen and I have developed a more efficient way to create 
associative networks, by applying a bootstrapping philosophy to hypertext navigation 
(Bollen & Heylighen, 1996, 1997, 1998). The hypertext paradigm provides a very 
simple and natural interface for representing any type of knowledge network to a user 
(Heylighen, 1991a). A document in a hypertext may contain the description in natural 
language or other media of a concept, while the associations of that concept to other 
concepts are represented by hypertext links within the document's text or graphics. The 
selection of a link by the user calls up the linked document, thus allowing the user to 
“browse” or “navigate” through the network. The problem with present hypertexts is 
the same one as with semantic networks and other knowledge representations: the 
system of nodes and links is created in an ad hoc way by the system designer, without 
being justified by either an underlying theory, or an empirical method for testing the 
effectiveness of the resulting network. This typically results in labyrinthine, “spaghetti- 
like” meshes of interconnected data, so that the user quickly gets lost in hyperspace. 

The method we developed allows an associative hypertext network to “self- 
organize” into a simpler, more meaningful, and more easily usable network. The term 
“self-organization” is appropriate to the degree that there is no external programmer or 
designer deciding which node to link to which other node: better linking patterns emerge 
spontaneously. The existing links bootstrap new links into existence, which in turn 
change the existing link patterns. The information used to create new links is not internal 
to the network, though: it derives from the collective actions of the different users. In 
that sense one might say that the network adapts or “learns” from the way it is used. 

The algorithms for such a learning web are very simple. Every potential link is 
assigned a certain “strength”. For a given node a, only the links with the highest strength 
are actualized, i.e. are visible to the user. Within the node, these links are ordered by 
strength, so that the user will encounter the strongest link first. There are three separate 
learning rules for adapting the strengths. 

1) Each time an existing link, say a → b, is chosen by the user, its strength is 
increased. Thus, the strength of a link becomes a reflection of the frequency with which 
it is used by hypertext navigators. This rather obvious rule can only consolidate links 
that are already available within the node. In that sense, it functions as a selector of 
strong connections. However, it cannot actualize new links, since these are not 
accessible to the user. Therefore we need complementary rules that generate novelty or 
variation. 

2) A user might follow an indirect connection between two nodes, say a → b, b → c. 
In that case the potential link a → c increases its strength. This is a weak form of 
transitivity. It opens up an unlimited realm of new links. Indeed, one or several increases 
in strength of a → c may be sufficient to make the potential link actual. The user can 
now directly select a → c, and from there perhaps c → d. This increases the strength of 
the potential link a → d, which may in turn become actual, providing a starting point for 
an eventual further link a → e, and so on. Eventually, an indefinitely extended path may 
thus be replaced by a single link a → z. Of course, this assumes that a sufficient number 
of users effectively follow that path. Otherwise it will not be able to overcome the 
competition from paths chosen by other users, which will also increase their strengths. 
The underlying principle is that the paths that are most popular, i.e. followed most 
often, will eventually be replaced by direct links, thus minimizing the average number of 
links a user must follow in order to reach his or her preferred destination. 

3) A similar rule can be used to implement a weak form of symmetry. When a user 
chooses a link a → b, implying that there exists some association between the nodes a 
and b, we may assume that this also implies some association between b and a. 
Therefore, the reverse link b → a gets a strength increase. This symmetry rule on its 
own is much more limited than transitivity, since it can only actualize a single new link 
for each existing link. 

However, the collective effect of symmetry and transitivity is much more powerful 
than that of any single rule. For example, consider two links a₁ → b, a₂ → b. The fact 
that a₁ and a₂ point to the same node seems to indicate that a₁ and a₂ have something in 
common, i.e. are related in some way. However, none of the rules will directly generate a 
link between a₁ and a₂. Yet, the repeated selection of the link a₂ → b may actualize the 
link b → a₂ by symmetry. The repeated selection of the already existing link a₁ → b 
followed by this new link can then actualize the link a₁ → a₂ through transitivity. 
Similar scenarios can be conceived for different orientations or different combinations of 
the links. 

A remaining issue is the relative importance of the three above rules. In other 
words, how large should the increase in strength be for each of the rules? If we choose 
unity (1) to be the bonus given by the first rule, there are two remaining parameters or 
degrees of freedom: t is the bonus for transitivity, s for symmetry. Since the direct 
selection of a link by a user seems a more reliable indication of its usefulness than an 
indirect selection, we assume t < 1, s < 1. The actual values will determine the efficiency 
of the learning process, but it seems that this matter cannot be settled by pure 
theoretical reasoning. 
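
The three rules can be summarized in a minimal Python sketch. This is an illustration under simplifying assumptions rather than the code of our experimental system: the class name `LearningWeb`, its methods, the default of 10 visible links, and the restriction of transitivity to two consecutive selections are all choices made for the example. The sketch also does not check that a selected link is currently visible.

```python
from collections import defaultdict


class LearningWeb:
    """Toy learning web: strengths of potential links adapted by three rules."""

    def __init__(self, t=0.5, s=0.3, visible=10):
        self.strength = defaultdict(float)  # (source, target) -> strength
        self.t = t              # bonus for transitivity (assumed t < 1)
        self.s = s              # bonus for symmetry (assumed s < 1)
        self.visible = visible  # number of actualized links per node

    def visible_links(self, node):
        """Actual links of `node`: the strongest potential links, strongest first."""
        out = [(target, w) for (source, target), w in list(self.strength.items())
               if source == node and w > 0]
        out.sort(key=lambda pair: -pair[1])
        return [target for target, _ in out[:self.visible]]

    def follow(self, path):
        """Update strengths for a user who visited the nodes in `path` in order."""
        for i in range(1, len(path)):
            a, b = path[i - 1], path[i]
            self.strength[(a, b)] += 1.0     # rule 1: direct selection of a -> b
            self.strength[(b, a)] += self.s  # rule 3: symmetry, b -> a
            if i >= 2 and path[i - 2] != b:
                # rule 2: transitivity, previous -> a -> b rewards previous -> b
                self.strength[(path[i - 2], b)] += self.t


web = LearningWeb(t=0.5, s=0.3)
web.follow(["dog", "meat", "carnivore"])
print(web.visible_links("dog"))  # "meat" first, then the new link to "carnivore"
```

After a single path dog → meat → carnivore, the potential link dog → carnivore has already received the bonus t; repeated traversals by other users would let it overtake weaker links in the list.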


6.3. EXPERIMENTAL RESULTS 


In order to test these ideas in practice we have set up two experiments. We built a 
network consisting of 150 nodes, corresponding to the 150 most frequent nouns of the 
English language. Every node was assigned 10 links to other nodes. These links were 
randomly selected from the 149 remaining nodes to initialize the web, but would then 
evolve according to the above learning rules (with t = 0.5 and s = 0.3). We made the web 
available on the Internet, and invited volunteers to browse through it, selecting those 
links from a given node which seemed somehow most related to it. For example, if the 
start node represented the noun “dog”, a user would choose a link to an associated word, 
such as “cat”, “animal”, or “fur”, but not to a totally unrelated word, such as 
“mathematics”. Of course, in the beginning of the experiment, there would be very few 
good associations available in the lists of 10 random words, and users might have to be 
satisfied with a rather weak association, such as “meat”. However, when reaching the 
node “meat”, they might be able to select there another association, such as “carnivore”. 
Through transitivity, a new link to “carnivore” might then appear in the node “dog”, 
displacing the weakest link in the list, while providing a much better association than the 
previously best one, “meat”. 

Although there were some unexpected side effects, such as the 'attractor effect' (the 
tendency for a group of related nodes to develop links only to other members of the 
group, so that users entering that group of nodes became practically unable to leave it), 
the development of the associative network was surprisingly quick and efficient (Bollen 
& Heylighen, 1996). After only 2500 link selections (out of 22500 potential links) both 
experimental networks had achieved a fairly well-organised structure in which most 
nodes had been connected to large clusters of related words. This may be illustrated by a 
typical example of how connections are gradually introduced and rewarded until their 
strength reaches an equilibrium value (Table 2). The position of these associated words 
shifted upwards in the list until they reached a position that best seemed to reflect their 
relative strength. 


KNOWLEDGE 

0             200           800           4000 
trade         education     education     education 
view          experience    experience    experience 
health        example       development   research 
theory        theory        theory        development 
face          training      research      mind 
book          development   example       life 
line          history       life          theory 
world         view          training      training 
side          situation     order         thought 
government    work          effect        interest 


Table 2: self-organization of the list of 10 strongest links from the word “knowledge”, in different stages: 
initial random linking pattern, after 200 steps, after 800 steps, and after 4000 steps. A step corresponds 
to a user selecting a link on one of the 150 nodes, in a web that evolves according to the direct, transitive 
and symmetric learning rules. 


The net result of the experiment was a 150 x 150 matrix of association strengths, 
which reflected fairly well the most important intuitive associations existing among the 
concepts. For example, a cluster analysis performed on the matrix produced 9 general 
categories (Time, Space, Movement, Control, Cognition, Intimacy, Vitality, Society, 
Office), grouping most of the related words in a single class. 

We are now trying to determine to what degree these results from our learning web 
correlate with different word associations derived by other means (e.g. free association 
experiments, or letting people judge the degree of synonymity). We also plan to test the 
usefulness of the self-organization, by checking to what extent users find knowledge more 
effectively in a self-organized network, as compared to a network that did not undergo 
learning. This can be done by measuring the average number of steps needed to find a 
node, and the average time needed to choose a link. We are further considering additional 
learning rules, such as similarity (nodes with a high similarity measure for their input and 
output links would get stronger cross-connections), that may make learning more 
effective. Although this research is still in its initial stage, and will need much empirical 
testing to confirm its usefulness, it seems like a very promising approach to quickly and 
easily develop complex associative networks that are more adequate than hypertexts 
built manually. 
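
One simple proxy for the first of these measures can be sketched in Python. The function below computes the mean shortest-path length over all reachable ordered pairs of nodes, which gives only a lower bound on the number of steps real users would need, since users do not necessarily follow shortest paths. The function name `mean_shortest_path` and the adjacency-list format are assumptions of this example, not part of the planned experiments.

```python
from collections import deque


def mean_shortest_path(adj):
    """Mean number of links on the shortest path between reachable node pairs.

    `adj` maps each node to the list of nodes its visible links point to.
    """
    total, pairs = 0, 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nxt in adj.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the start node itself
    return total / pairs if pairs else float("inf")


before = {"a": ["b"], "b": ["c"], "c": ["a"]}
after = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # shortcut a -> c added
print(mean_shortest_path(before), mean_shortest_path(after))
```

Comparing the value for a network before and after learning indicates whether the new direct links have shortened typical routes, independently of how long users take to choose a link.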


7. Towards an Intelligent Web 


Apart from simple knowledge elicitation, the most obvious practical application of the 
various network models for knowledge structuring which we have reviewed lies in the 
World-Wide Web, the hypermedia interface to the information stored on the global 
Internet computer network. As noted earlier, the hypertext organization maps directly onto the 
directed graphs used in entailment nets or semantic networks. The web is a perfect 
example of a distributed and collectively constructed system of knowledge. The wealth 
of information provided by the millions of web documents, however, is offset by the 
almost total lack of structure in the way these documents are linked. This makes it far 
from trivial to find the precise information one is looking for, even when that 
information is within easy reach. 

The structuring algorithms which we discussed could be used to make the web 
simpler, more efficient, less redundant, and more complete, by pointing out lacking or 
superfluous links and nodes, and by suggesting better connection patterns. In particular 
the associative learning algorithms we sketched could be applied rather straightforwardly 
to the web as it now exists (Bollen & Heylighen, 1996; Heylighen & Bollen, 1996; 
Heylighen, 1999, 2001), resulting in a dramatic reduction in the average number of links 
a user needs to traverse in order to find a specific item. The proposed node merging and 
clustering techniques may be useful in creating automatic indexes or review documents, 
that group links to similar documents in a single new page (see Kleinberg, 1998, for an 
excellent example of how linking patterns can be used to create high-quality clusters of 
related pages). More generally, they should be able to continuously check the coherence 
and completeness of the web. If the program finds contradictions or gaps, it would try 
to situate the persons most likely to understand the issue (e.g. the authors or active 
users of the related documents), and elicit the missing knowledge from them in order to 
fill the gap by creating a new node (Heylighen & Bollen, 1996). 

The semantic network organization too should be able to simplify web browsing by 
providing a simple and unified set of link types, so that users can be more selective in 
which links they need to explore. For example, if you are interested in the general class 
to which a concept belongs, it would be meaningless to spend time exploring its 
instances. The recent extension of the HTML language that underlies web hypertext to 
the new XML language is intended to support the creation of shared ontologies that 
describe basic categories of web nodes and links. 

While such a bootstrapping structuration of the web would help a user navigate 
through hyperspace, entailment nets may support information retrieval in an even more 
direct way. The directionality of entailment nets allows non-trivial inferences. Such 
inferences can be used to answer queries. Queries can be formulated either in a 
well-structured, formal way or in an associative, fuzzy way (cf. Heylighen, 1991a), depending on 
how clear the problem is for the user. Formal queries are aimed at determining the 
presence of a particular semantic or entailment relation between two or more concepts. 
For example, “Can a penguin fly?”, is a “yes-no” query that needs to be resolved by 
determining the presence or absence of an entailment from penguin to can fly. An 
open-ended query like “Which birds cannot fly?” should produce a list of all concepts 
that entail both bird and cannot fly. Associative queries, on the other hand, do not ask 
for the presence of specific relations, but for the concepts that are in the most general 
way associated with the concepts the user has in mind, e.g. bird, ice, fish, cute. 
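
The two kinds of formal query can be illustrated with a small Python sketch. It assumes that entailment is transitive, so that a concept entails everything reachable from it along entailment links; the toy net and the function names `entailed_by`, `entails` and `which` are invented for this example.

```python
def entailed_by(net, start):
    """All concepts reachable from `start` by following entailment links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in net.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def entails(net, a, b):
    """Yes-no query: is there an entailment from a to b?"""
    return b in entailed_by(net, a)


def which(net, *properties):
    """Open-ended query: all concepts that entail every one of `properties`."""
    return sorted(c for c in net
                  if all(entails(net, c, p) for p in properties))


# Hypothetical toy entailment net: concept -> concepts it directly entails
net = {
    "penguin": ["bird", "cannot fly"],
    "ostrich": ["bird", "cannot fly"],
    "sparrow": ["bird", "can fly"],
    "bird": ["animal"],
}
print(entails(net, "penguin", "can fly"))  # "Can a penguin fly?"
print(which(net, "bird", "cannot fly"))    # "Which birds cannot fly?"
```

In this toy net the first query finds no entailment from penguin to can fly, and the second returns ostrich and penguin.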

Both types of queries can be tackled in a knowledge network by the mechanism of 
spreading activation (Jones, 1986; Salton & Buckley, 1988; Chen & Ng, 1995): nodes or 
concepts that are linked to the concepts in the query are “activated”. The activation 
spreads from those nodes through their links to neighbouring nodes, and the nodes 
which have received the highest activation are brought forward as candidate answers to 
the query. If none of the proposals are acceptable, those that seem closest to the answer 
are again activated and used as sources for a new process of spreading. This process is 
repeated, with the activation moving in parallel from node to node via their links, until a 
satisfactory solution is found. 

In the case of associative queries, the only constraint on spreading activation is the 
strength of the intervening links: the activation passed through a link is proportional to 
its strength. The activation arriving in a node is the weighted sum of the activations 
arriving through all input links of that node. If the associative network is represented by 
a matrix of link strengths, spreading activation can be implemented by repeatedly 
multiplying an input vector representing the initial activation for all nodes in the 
network (typically with values of 1 for the query terms and 0 for all the others) with the 
matrix, and then selecting the nodes with the highest activation values from the different 
output vectors. (Chen & Ng, 1995, explore the usefulness of some other spreading 
activation algorithms for concept retrieval). 
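
A minimal Python sketch of this matrix formulation follows. The strengths in the toy matrix are invented for illustration and are not taken from our experimental data. The function names `spread` and `answer`, the number of steps, and the decision to rank nodes by the sum of their activations over all output vectors are likewise assumptions of the example.

```python
def spread(matrix, activation, steps):
    """Return the activation vector obtained after each of `steps` multiplications.

    matrix[i][j] is the strength of the link from node i to node j.
    """
    n = len(matrix)
    outputs = []
    for _ in range(steps):
        # activation arriving in node j: weighted sum over all input links i -> j
        activation = [sum(activation[i] * matrix[i][j] for i in range(n))
                      for j in range(n)]
        outputs.append(activation)
    return outputs


def answer(nodes, matrix, query, steps=2):
    """Rank the non-query nodes by total activation received (highest first)."""
    start = [1.0 if name in query else 0.0 for name in nodes]
    totals = [0.0] * len(nodes)
    for vector in spread(matrix, start, steps):
        totals = [t + v for t, v in zip(totals, vector)]
    candidates = [(totals[k], name) for k, name in enumerate(nodes)
                  if name not in query]
    candidates.sort(key=lambda pair: -pair[0])
    return [name for _, name in candidates]


nodes = ["control", "society", "government", "office", "paper"]
matrix = [  # illustrative strengths only
    [0.0, 0.0, 0.8, 0.2, 0.0],  # control -> government, office
    [0.0, 0.0, 0.7, 0.0, 0.0],  # society -> government
    [0.0, 0.5, 0.0, 0.0, 0.0],  # government -> society
    [0.0, 0.0, 0.0, 0.0, 0.0],  # office
    [0.0, 0.0, 0.0, 0.9, 0.0],  # paper -> office
]
print(answer(nodes, matrix, {"control", "society"}))
```

With these invented strengths the two query terms reinforce each other on “government”, which therefore ends up ahead of “office”.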

We have implemented such a spreading activation program that uses the associative 
data resulting from our learning web experiment (Bollen & Heylighen, 1996b). The 
program often manages to mimic the “intuitive” reactions of a human subject trying to 
guess a word from various clues. For example, the input of the clue words “control” and 
“society” produces the word “government” as most highly activated, while the words 
“building”, “work” and “paper” produce “office”. This is similar to the way thoughts 
diffuse in the brain, moving along intuitive, fuzzy pathways, rather than retrieving exact 
matches like traditional search engines. Such “inferences” could obviously never have 
been achieved through logical deductions, since there is no way in which “office” could 
have been defined by a boolean combination of the query terms above. 

In the case of formal queries, spreading activation will not be a continuously 
diffusing intensity, but a discrete state of activation which can only follow certain paths. 
For example, a query looking for birds that cannot fly should not follow links of the 
“has part” or “causes” type, but only of the “is a” type. The typical implementation in 
a semantic network will activate the concepts in the query and let the activation follow 
all possible links of the right type until the activated paths intersect. This is quite 
inefficient for proving negative connections, though, since all possible paths need to be 
explored between penguin and can fly in order to show that none exist. For such 
situations, it seems better to allow negative entailments or “inhibitory” links, of the 
type “if penguin, then not (can fly)”. 
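
The following Python sketch illustrates how typed links and an inhibitory link might cooperate in such a query. It is an assumption-laden toy: the link type names, the function `can`, and the rule that the first answer found in a breadth-first search (hence the most specific one) wins are choices made for this example, not a description of an existing system.

```python
from collections import deque


def can(links, concept, prop):
    """Follow only "is a" links upward from `concept`, looking for `prop`.

    Returns True for a positive entailment, False for an inhibitory link,
    and None if neither is found. Breadth-first order means the answer
    attached to the most specific class is found first.
    """
    queue, seen = deque([concept]), {concept}
    while queue:
        node = queue.popleft()
        for kind, target in links.get(node, ()):
            if target == prop and kind == "entails":
                return True
            if target == prop and kind == "inhibits":
                return False  # negative entailment: stop without exploring further
            if kind == "is a" and target not in seen:
                seen.add(target)
                queue.append(target)
            # links of other types ("has part", "causes", ...) are ignored
    return None


# Hypothetical typed network: node -> list of (link type, target)
links = {
    "penguin": [("is a", "bird"), ("inhibits", "can fly"), ("has part", "beak")],
    "sparrow": [("is a", "bird")],
    "bird": [("is a", "animal"), ("entails", "can fly")],
}
print(can(links, "penguin", "can fly"), can(links, "sparrow", "can fly"))
```

The inhibitory link on penguin settles the negative query immediately, whereas without it the search would have to exhaust every “is a” path before concluding that no positive connection exists.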

It seems that the introduction of these different types of flexible inference and 
discovery mechanisms could turn the rapidly developing World-Wide Web from a huge, 
static repository into an active processor and creator of knowledge (Heylighen & Bollen, 
1996). The best metaphor for capturing the collective intelligence (Heylighen, 1999) 
formed by millions of users interacting with such a self-organizing, “thinking” web may 
be the one of a global brain (Russell, 1995). Although many issues still need to be 
resolved, work is starting in different quarters to turn this science-fiction-like vision into 
a concrete reality (Goertzel & Pritchard, 1997; Mayer-Kress & Barczys, 1995; 
Heylighen & Bollen, 1996; Heylighen, 1999, 2001). 

As a true visionary, Gordon Pask had already envisaged a self-organizing network 
at the planetary scale well before the World-Wide Web was created (Scott, 1982). This 
is not so surprising if we take into account that his conversational systems and the 
present vision of an intelligent web share their view of knowledge as a collective 
construction striving to achieve coherence, rather than a mapping of external objects. By 
abandoning the correspondence epistemology and its reliance on fixed primitives, 
bootstrapping approaches open the way to a truly flexible, adaptive and creative 
knowledge system. Of course, the systems sketched here are still in their infancy, and 
need to be thoroughly tested under diverse circumstances, and implemented on a 
sufficiently large scale to show their practical usefulness. This will obviously require a 
very large effort. I hope that the work of Gordon Pask, myself and our colleagues will 
provide sufficient inspiration for other researchers to take up these challenges. 